
    Iterative Geometry-Aware Cross Guidance Network for Stereo Image Inpainting

    Single image inpainting has achieved promising results with deep convolutional neural networks. However, inpainting stereo images with missing regions, a significant and distinct problem, has not been explored thoroughly. One crucial requirement for stereo image inpainting is stereo consistency. To achieve it, we propose an Iterative Geometry-Aware Cross Guidance Network (IGGNet). IGGNet contains two key ingredients: a Geometry-Aware Attention (GAA) module and an Iterative Cross Guidance (ICG) strategy. The GAA module relies on epipolar geometry cues and learns geometry-aware guidance from one view to the other, which helps keep the corresponding regions in the two views consistent. However, learning guidance from co-existing missing regions is challenging. To address this issue, the ICG strategy alternately narrows down the missing regions of the two views in an iterative manner. Experimental results demonstrate that our network outperforms the latest stereo image inpainting model and state-of-the-art single image inpainting models. Comment: Accepted by IJCAI 202
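    As a rough illustration of the Iterative Cross Guidance idea, the sketch below alternately shrinks the missing region of each view using valid pixels from the other view. The mask-erosion helper and the pixel-copy "guidance" are invented stand-ins: the actual GAA module uses learned, epipolar-geometry-aware attention rather than co-located copying.

```python
# Minimal sketch of Iterative Cross Guidance (not the authors' code): the two
# views are completed alternately, and each pass shrinks the other view's
# missing region using the freshly available content as guidance.
import numpy as np

def erode_mask(mask, k=1):
    """Shrink a binary hole mask by k pixels (1 = missing). Hypothetical helper."""
    out = mask.copy()
    for _ in range(k):
        p = np.pad(out, 1, constant_values=0)
        # a hole pixel survives only if all four neighbours are also holes
        out = (p[1:-1, 1:-1] * p[:-2, 1:-1] * p[2:, 1:-1] *
               p[1:-1, :-2] * p[1:-1, 2:]).astype(out.dtype)
    return out

def cross_guided_fill(target, target_mask, guide, guide_mask):
    """Toy stand-in for geometry-aware guidance: fill the border ring of the
    target's hole with guide pixels wherever the guide is already valid.
    Epipolar warping between the views is deliberately omitted here."""
    band = target_mask - erode_mask(target_mask)   # ring being filled this pass
    usable = band * (1 - guide_mask)               # guide must be valid there
    filled = target * (1 - usable) + guide * usable
    return filled, target_mask * (1 - usable)      # those pixels are no longer missing

def iterative_cross_guidance(left, left_mask, right, right_mask, iters=4):
    for _ in range(iters):
        left, left_mask = cross_guided_fill(left, left_mask, right, right_mask)
        right, right_mask = cross_guided_fill(right, right_mask, left, left_mask)
    return left, right

# Toy usage on random images; 1 marks missing pixels in each view's mask.
left, right = np.random.rand(64, 64), np.random.rand(64, 64)
left_mask = np.zeros((64, 64), int); left_mask[20:40, 20:40] = 1
right_mask = np.zeros((64, 64), int); right_mask[25:45, 25:45] = 1
left_out, right_out = iterative_cross_guidance(left, left_mask, right, right_mask)
```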

    Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation

    Translating images from a source domain to a target domain for learning target models is one of the most common strategies in domain adaptive semantic segmentation (DASS). However, existing methods still struggle to preserve semantically-consistent local details between the original and translated images. In this work, we present an innovative approach that addresses this challenge by using source-domain labels as explicit guidance during image translation. Concretely, we formulate cross-domain image translation as a denoising diffusion process and utilize a novel Semantic Gradient Guidance (SGG) method to constrain the translation process, conditioning it on the pixel-wise source labels. Additionally, a Progressive Translation Learning (PTL) strategy is devised to enable the SGG method to work reliably across domains with large gaps. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods. Comment: Accepted to ICCV202
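    Label-guided translation can be pictured as classifier-guidance-style conditioning of a reverse diffusion step on the source labels. The sketch below shows that generic pattern only, assuming a pretrained noise model `eps_model`, a differentiable `segmenter`, and a DDIM-style update; it is not the paper's SGG or PTL formulation.

```python
# Generic sketch of a label-guided reverse diffusion step (assumptions, not
# the authors' implementation): shift the denoised estimate along the gradient
# of a pixel-wise cross-entropy against the source-domain labels.
import torch
import torch.nn.functional as F

def guided_reverse_step(x_t, t, eps_model, segmenter, source_labels,
                        alpha_bar, guidance_scale=1.0):
    """x_t: (B, 3, H, W) current noisy sample; alpha_bar: 1-D tensor of
    cumulative noise levels; source_labels: (B, H, W) long tensor."""
    a_t = alpha_bar[t]
    eps = eps_model(x_t, t)                               # predicted noise
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # predicted clean image

    # Semantic gradient: cross-entropy between the segmenter's prediction on
    # the current estimate and the pixel-wise source labels.
    x0_hat = x0_hat.detach().requires_grad_(True)
    logits = segmenter(x0_hat)                            # (B, C, H, W)
    sem_loss = F.cross_entropy(logits, source_labels)
    grad = torch.autograd.grad(sem_loss, x0_hat)[0]

    # Nudge the denoised estimate to keep the source semantics.
    x0_guided = (x0_hat - guidance_scale * grad).detach()

    # Re-noise to the previous timestep (simplified DDIM-style update).
    a_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
    return a_prev.sqrt() * x0_guided + (1 - a_prev).sqrt() * eps
```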

    Long-Term Anticipation of Activities with Cycle Consistency

    With the success of deep learning methods in analyzing activities in videos, more attention has recently turned to anticipating future activities. However, most work on anticipation either analyzes a partially observed activity or predicts the next action class. Recently, new approaches have been proposed that extend the prediction horizon to several minutes into the future and anticipate a sequence of future activities, including their durations. While these works decouple the semantic interpretation of the observed sequence from the anticipation task, we propose a framework that anticipates future activities directly from the features of the observed frames and is trained end-to-end. Furthermore, we introduce a cycle consistency loss over time by predicting the past activities given the predicted future. Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads. Comment: GCPR 202
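    A compact way to see the cycle consistency term: a forward model anticipates future frame features from the observed ones, a backward model reconstructs the observed past from that prediction, and the mismatch is penalized alongside the anticipation loss. The GRU sequence models, feature dimension, and MSE losses below are illustrative assumptions, not the paper's architecture.

```python
# Sketch of anticipation with a cycle-consistency loss over time, using
# placeholder GRU encoder-decoders and MSE objectives.
import torch
import torch.nn as nn

class SeqToSeq(nn.Module):
    def __init__(self, dim, out_len):
        super().__init__()
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out_len = out_len

    def forward(self, x):                       # x: (B, T_in, D)
        _, h = self.enc(x)
        dec_in = h[-1].unsqueeze(1).repeat(1, self.out_len, 1)
        y, _ = self.dec(dec_in)
        return y                                # (B, out_len, D)

def anticipation_losses(observed, future_gt, fwd, bwd):
    future_pred = fwd(observed)                 # anticipate the future
    past_recon = bwd(future_pred)               # cycle back to the observed past
    pred_loss = nn.functional.mse_loss(future_pred, future_gt)
    cycle_loss = nn.functional.mse_loss(past_recon, observed)
    return pred_loss + cycle_loss

# Toy usage with random 64-d frame features (dimensions are assumptions).
fwd = SeqToSeq(64, out_len=30)                  # observed -> future
bwd = SeqToSeq(64, out_len=20)                  # predicted future -> observed past
obs, fut = torch.randn(2, 20, 64), torch.randn(2, 30, 64)
loss = anticipation_losses(obs, fut, fwd, bwd)
```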

    Learning Latent Global Network for Skeleton-based Action Prediction

    Human actions represented with 3D skeleton sequences are robust to cluttered backgrounds and illumination changes. In this paper, we investigate skeleton-based action prediction, which aims to recognize an action from a partial skeleton sequence that contains incomplete action information. We propose a new Latent Global Network based on adversarial learning for action prediction. We demonstrate that the proposed network provides latent long-term global information that is complementary to the local action information of the partial sequences, and that combining the two improves action prediction. We test the proposed method on three challenging skeleton datasets and report state-of-the-art performance.
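    The adversarial latent-global idea can be sketched as follows: a generator maps the partial-sequence (local) feature to a latent global feature, a discriminator learns to tell it apart from features of full sequences, and the action classifier consumes the local and latent global features together. All modules, dimensions, and class counts below are placeholders, not the paper's design.

```python
# Rough sketch of learning latent global information adversarially and
# combining it with local features for action prediction.
import torch
import torch.nn as nn

feat_dim = 128
local_enc = nn.GRU(75, feat_dim, batch_first=True)    # partial sequence -> local feature
global_enc = nn.GRU(75, feat_dim, batch_first=True)   # full sequence -> "real" global feature
generator = nn.Linear(feat_dim, feat_dim)              # local feature -> latent global feature
discriminator = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
classifier = nn.Linear(2 * feat_dim, 60)               # local + latent global -> action logits
bce = nn.BCEWithLogitsLoss()

def training_losses(partial_seq, full_seq, labels):
    _, h_local = local_enc(partial_seq)
    _, h_global = global_enc(full_seq)
    local_feat, real_global = h_local[-1], h_global[-1].detach()
    fake_global = generator(local_feat)

    # Discriminator: distinguish real full-sequence features from generated ones.
    d_loss = bce(discriminator(real_global), torch.ones(real_global.size(0), 1)) + \
             bce(discriminator(fake_global.detach()), torch.zeros(fake_global.size(0), 1))

    # Generator + classifier: fool the discriminator and predict the action
    # from the local and latent global features combined.
    g_loss = bce(discriminator(fake_global), torch.ones(fake_global.size(0), 1))
    logits = classifier(torch.cat([local_feat, fake_global], dim=1))
    cls_loss = nn.functional.cross_entropy(logits, labels)
    return d_loss, g_loss + cls_loss

# Toy usage: 25 joints x 3 coords = 75-d per frame, 60 classes (assumed).
partial, full = torch.randn(4, 20, 75), torch.randn(4, 60, 75)
labels = torch.randint(0, 60, (4,))
d_loss, gc_loss = training_losses(partial, full, labels)
```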

    ERA: Expert Retrieval and Assembly for Early Action Prediction

    Early action prediction aims to successfully predict the class label of an action before it is completely performed. This is a challenging task because the beginning stages of different actions can be very similar, with only minor subtle differences for discrimination. In this paper, we propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts most specialized at using the discriminative subtle differences needed to distinguish an input sample from other highly similar samples. To encourage our model to effectively use subtle differences for early action prediction, we push the experts to discriminate exclusively between samples that are highly similar, forcing them to learn the subtle differences that exist between those samples. Additionally, we design an effective Expert Learning Rate Optimization method that balances the experts' optimization and leads to better performance. We evaluate our ERA module on four public action datasets and achieve state-of-the-art performance. Comment: Accepted to ECCV 202
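    Functionally, "retrieve then assemble" resembles a gated mixture of experts: a gate scores a bank of experts for the input, the top-k most relevant ones are retrieved, and their outputs are combined with the gate weights. The sketch below shows only that generic pattern; the sizes, top-k, and linear experts are assumptions, and the paper's Expert Learning Rate Optimization is not modeled.

```python
# Mixture-of-experts-style sketch of expert retrieval and assembly.
import torch
import torch.nn as nn

class ExpertRetrievalAssembly(nn.Module):
    def __init__(self, dim=256, n_experts=16, k=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):                        # x: (B, dim)
        scores = self.gate(x)                    # (B, n_experts)
        top_val, top_idx = scores.topk(self.k, dim=1)
        weights = torch.softmax(top_val, dim=1)  # (B, k)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):               # assemble the retrieved experts
            for j in range(self.k):
                expert = self.experts[int(top_idx[b, j])]
                out[b] += weights[b, j] * expert(x[b])
        return out

# Usage on a batch of partial-video features (assumed 256-d).
era = ExpertRetrievalAssembly()
refined = era(torch.randn(8, 256))
```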

    DiffPose: Toward More Reliable 3D Pose Estimation

    Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose to facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP. Project page: https://gongjia0208.github.io/Diffpose/. Comment: Accepted to CVPR 202
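    Treating 3D pose estimation as reverse diffusion can be sketched as an iterative denoising loop over joint coordinates, conditioned on a context feature from the 2D input. The denoiser, noise schedule, DDIM-style step, and Gaussian initialization below are placeholders; DiffPose instead initializes from a pose-specific, GMM-based uncertainty distribution.

```python
# Simplified sketch of pose estimation as context-conditioned reverse diffusion.
import torch

def estimate_pose(denoiser, context, init_pose, alpha_bar, steps):
    """init_pose: (B, J, 3) sample from an initial uncertainty distribution.
    alpha_bar: 1-D tensor of cumulative noise levels, decreasing in t."""
    x = init_pose
    for t in reversed(range(steps)):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
        eps = denoiser(x, t, context)                       # predicted noise, (B, J, 3)
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # predicted clean pose
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps   # DDIM-style step
    return x

# Toy usage with a dummy denoiser that ignores its inputs.
steps, B, J = 10, 2, 17
alpha_bar = torch.linspace(0.9, 0.1, steps)    # assumed schedule
denoiser = lambda x, t, ctx: torch.zeros_like(x)
context = torch.randn(B, 256)                  # e.g. features of the 2D pose / image
pose = estimate_pose(denoiser, context, torch.randn(B, J, 3), alpha_bar, steps)
```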

    Meta Agent Teaming Active Learning for Pose Estimation

    Existing pose estimation approaches often require a large number of annotated images to attain good estimation performance, and such annotations are laborious to acquire. To reduce the human effort spent on pose annotation, we propose a novel Meta Agent Teaming Active Learning (MATAL) framework to actively select and label informative images for effective learning. MATAL formulates the image selection procedure as a Markov Decision Process and learns an optimal sampling policy that directly maximizes the performance of the pose estimator based on the reward. Our framework consists of a novel state-action representation as well as a multi-agent team that enables batch sampling in the active learning procedure. The framework can be effectively optimized via Meta-Optimization to accelerate adaptation to the gradually expanded labeled data during deployment. Finally, we show experimental results on both human hand and body pose estimation benchmark datasets and demonstrate that our method consistently outperforms all baselines under the same annotation budget. Moreover, to obtain similar pose estimation accuracy, MATAL can save around 40% of the labeling effort on average compared to state-of-the-art active learning frameworks.
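    Casting batch active learning as a decision process can be illustrated with a minimal REINFORCE-style loop: a policy scores unlabeled images from state features, samples a batch for annotation, and is rewarded by the pose estimator's accuracy gain after retraining. The state features, policy network, and reward value below are placeholders; MATAL's state-action representation, multi-agent teaming, and meta-optimization are not modeled here.

```python
# Bare-bones sketch of reward-driven batch selection for active learning.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def select_batch(state_feats, batch_size):
    """state_feats: (N, 128) features describing each unlabeled image."""
    scores = policy(state_feats).squeeze(-1)     # (N,)
    probs = torch.softmax(scores, dim=0)
    idx = torch.multinomial(probs, batch_size, replacement=False)
    return idx, torch.log(probs[idx]).sum()      # chosen indices + batch log-prob

def reinforce_update(log_prob, reward):
    """REINFORCE-style update; reward = accuracy gain after labeling the batch."""
    loss = -reward * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# One simulated acquisition round with random features and a made-up reward.
feats = torch.randn(100, 128)
idx, log_prob = select_batch(feats, batch_size=8)
reinforce_update(log_prob, reward=0.02)          # e.g. +2% accuracy (illustrative)
```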